Frequency domain correspondence for speaker normalization

نویسندگان

  • Ming Liu
  • Xi Zhou
  • Mark Hasegawa-Johnson
  • Thomas S. Huang
  • Zhengyou Zhang
چکیده

Due to physiology and linguistic difference between speakers, the spectrum pattern for the same phoneme of two speakers can be quite dissimilar. Without appropriate alignment on the frequency axis, the misalignment will reduce the modeling efficiency resutling in performance degradation. In this paper, a novel data-driven framework is proposed to build the alignment of the frequency axes of two speakers. This alignment between two frequency axes is essentially a frequency domain correspondence of the two speakers. To establish the correspondence, we formulate the task as a global optimal matching problem. The local matching of frequency bins is achieved by comparing the local feature of the spectrogram along the frequency bins. The local feature is actually capturing the local pattern in the spectrogram. Given the local matching score, a dynamic programming is then applied to find the optimal correspondence. Experiments on TIMIT corpus and TIDIGITS corpus clearly show the effectiveness of this method.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Speaker Normalization with All-pass Transforms Center for Language and Speech Processing 72 Speaker Normalization with All-pass Transforms

Speaker normalization is a process in which the short-time features of speech from a given speaker are transformed so as to better match some speaker independent model. Vocal tract length normalization (VTLN) is a popular speaker normalization scheme wherein the frequency axis of the short-time spectrum associated with a particular speaker’s speech is rescaled or warped prior to the extraction ...

متن کامل

Vocal Tract Length Normalization for Large Vocabulary Continuous Speech Recognition

Generally speaking, the speaker-dependence of a speech recognition system stems from speaker-dependent speech feature. The variation of vocal tract length and/or shape is one of the major source of inter-speaker variations. In this paper, we address several methods of vocal tract length normalization (VTLN) for large vocabulary continuous speech recognition: (1) explore the bilinear warping VTL...

متن کامل

Speaker normalization with all-pass transforms

Speaker normalization is a process in which the short-time features of speech from a given speaker are transformed so as to better match some speaker independent model. Vocal tract length normalization (VTLN) is a popular speaker normalization scheme wherein the frequency axis of the short-time spectrum associated with a speaker's speech is rescaled or warped prior to the extraction of cepstral...

متن کامل

Bark-shift based nonlinear speaker normalization using the second subglottal resonance

In this paper, we propose a Bark-scale shift based piecewise nonlinear warping function for speaker normalization, and a joint frequency discontinuity and energy attenuation detection algorithm to estimate the second subglottal resonance (Sg2). We then apply Sg2 for rapid speaker normalization. Experimental results on children’s speech recognition show that the proposed nonlinear warping functi...

متن کامل

A fast approach to psychoacoustic model compensation for robust speaker recognition in additive noise

This paper addresses the problem of speaker verification in the presence of additive noise. We propose a fast implementation of Psychoacoustic Model Compensation (Psy-Comp) scheme for static features along with model domain mean and variance normalization for robust speaker recognition in noisy conditions. The proposed algorithms are validated through experiments on noise corrupted NIST-2000 sp...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2007